Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning
Abstract
Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods, which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings, which are then used to cluster values with different granularities. It further models the couplings in value clusters within the same granularity and with different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.
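The pipeline outlined in the abstract (couple values, cluster them at several granularities, then embed) can be illustrated with a rough sketch. This is not the authors' CDE implementation. It assumes a single coupling (conditional co-occurrence between values) where the abstract describes two, a small hand-written k-means, and PCA via SVD to obtain uncorrelated dimensions. All function names, the toy data, and the parameters `ks` and `dim` are hypothetical.

```python
import numpy as np

def value_coupling_matrix(rows):
    """Assumed coupling: M[i, j] = P(value_j | value_i), from row co-occurrence."""
    n_features = len(rows[0])
    values = sorted({(f, r[f]) for r in rows for f in range(n_features)})
    index = {v: i for i, v in enumerate(values)}
    counts = np.zeros((len(values), len(values)))
    for r in rows:
        ids = [index[(f, r[f])] for f in range(n_features)]
        for a in ids:
            for b in ids:
                counts[a, b] += 1
    freq = np.diag(counts).copy()  # counts[i, i] is the frequency of value i
    return values, counts / freq[:, None]

def kmeans_labels(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def embed_values(rows, ks=(2, 3), dim=2):
    """Cluster values at several granularities, then project with PCA."""
    values, M = value_coupling_matrix(rows)
    # One-hot cluster memberships, one block of columns per granularity k.
    Z = np.hstack([np.eye(k)[kmeans_labels(M, k)] for k in ks])
    Z = Z - Z.mean(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return values, U[:, :dim] * S[:dim]  # principal components are uncorrelated

# Toy categorical data set (hypothetical): features are (color, size).
rows = [("red", "small"), ("red", "small"), ("blue", "large"),
        ("blue", "large"), ("green", "large")]
values, emb = embed_values(rows)
```

Each row of `emb` is the vector for one `(feature index, value)` pair in `values`. A data object could then be represented by concatenating the vectors of its values, which is again an assumption of this sketch rather than something the abstract specifies.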
Similar resources
Context-Based Distance Learning for Categorical Data Clustering
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition of this work is that the d...
Detecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically regarded as a group of nodes with a high density of edges among themselves and a low density of edges relative to other parts of the network. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
Data Mining with Semantic Features Represented as Vectors of Semantic Clusters
Data mining with taxonomies merged with categorical data has been studied in the past, but often limited to small taxonomies. Taxonomies are used to aggregate categorical data such that patterns induced from the data can be expressed at higher levels of conceptual generality. Semantic similarity and relatedness measures can be used to aggregate categorical values for cluster-based data mining al...
On-Line Learning of Predictive Compositional Hierarchies by Hebbian Chunking
I have investigated systems for on-line, cumulative learning of compositional hierarchies embedded within predictive probabilistic models. The hierarchies are learned unsupervised from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge re...
A Novel Image Denoising Method Based on Incoherent Dictionary Learning and Domain Adaptation Technique
In this paper, a new method for image denoising based on incoherent dictionary learning and a domain transfer technique is proposed. Sparse representation is an area of considerable interest to researchers. The goal of sparse coding is to approximately model the input data as a weighted linear combination of a small number of basis vectors. Two characteristics should b...
Publication date: 2017